interaction block
Hyperspectral Adapter for Semantic Segmentation with Vision Foundation Models
Hurtado, Juana Valeria, Mohan, Rohit, Valada, Abhinav
Hyperspectral imaging (HSI) captures spatial information along with dense spectral measurements across numerous narrow wavelength bands. This rich spectral content has the potential to facilitate robust robotic perception, particularly in environments with complex material compositions, varying illumination, or other visually challenging conditions. However, current HSI semantic segmentation methods underperform due to their reliance on architectures and learning frameworks optimized for RGB inputs. In this work, we propose a novel hyperspectral adapter that leverages pretrained vision foundation models to effectively learn from hyperspectral data. Our architecture incorporates a spectral transformer and a spectrum-aware spatial prior module to extract rich spatial-spectral features. Additionally, we introduce a modality-aware interaction block that facilitates effective integration of hyperspectral representations and frozen vision Transformer features through dedicated extraction and injection mechanisms. Extensive evaluations on three benchmark autonomous driving datasets demonstrate that our architecture achieves state-of-the-art semantic segmentation performance while directly using HSI inputs, outperforming both vision-based and hyperspectral segmentation methods. We make the code available at https://hsi-adapter.cs.uni-freiburg.de.
A Proof of Theorem 2
We prove the universal approximation theorem by showing the equivalence of TFN and our model. Complex spherical harmonics are related to Clebsch-Gordan coefficients via [51, 3.7.72] We can therefore adapt Eq. (2) by substituting C To see this, we look at the result's real component null [ H To prove this theorem we first introduce a proposition by Villar et al. [57]. GemNet's variance varies strongly between layers and increases significantly after each block without scaling factors (top). We use 4 stacked interaction blocks and an embedding size of 128 throughout the model.
A Proof of Theorem 2
We prove the universal approximation theorem by showing the equivalence of TFN and our model. Complex spherical harmonics are related to Clebsch-Gordan coefficients via [51, 3.7.72] We can therefore adapt Eq. (2) by substituting C To see this, we look at the result's real component null [ H To prove this theorem we first introduce a proposition by Villar et al. [57]. GemNet's variance varies strongly between layers and increases significantly after each block without scaling factors (top). We use 4 stacked interaction blocks and an embedding size of 128 throughout the model.
Fourier Basis Mapping: A Time-Frequency Learning Framework for Time Series Forecasting
Yang, Runze, Cao, Longbing, You, Xin, Fang, Kun, Li, Jianxun, Yang, Jie
The integration of Fourier transform and deep learning opens new avenues for time series forecasting. We reconsider the Fourier transform from a basis functions perspective. Specifically, the real and imaginary parts of the frequency components can be regarded as the coefficients of cosine and sine basis functions at tiered frequency levels, respectively. We find that existing Fourier-based methods face inconsistent starting cycles and inconsistent series length issues. They fail to interpret frequency components precisely and overlook temporal information. Accordingly, the novel Fourier Basis Mapping (FBM) method addresses these issues by integrating time-frequency features through Fourier basis expansion and mapping in the time-frequency space. Our approach extracts explicit frequency features while preserving temporal characteristics. FBM supports plug-and-play integration with various types of neural networks by only adjusting the first initial projection layer for better performance. First, we propose FBM-L, FBM-NL, and FBM-NP to enhance linear, MLP-based, and Transformer-based models, respectively, demonstrating the effectiveness of time-frequency features. Next, we propose a synergetic model architecture, termed FBM-S, which decomposes the seasonal, trend, and interaction effects into three separate blocks, each designed to model time-frequency features in a specialized manner. Finally, we introduce several techniques tailored for time-frequency features, including interaction masking, centralization, patching, rolling window projection, and multi-scale down-sampling. The results are validated on diverse real-world datasets for both long-term and short-term forecasting tasks with SOTA performance.
FastCHGNet: Training one Universal Interatomic Potential to 1.5 Hours with 32 GPUs
Zhou, Yuanchang, Hu, Siyu, Wang, Chen, Wang, Lin-Wang, Tan, Guangming, Jia, Weile
Graph neural network universal interatomic potentials (GNN-UIPs) have demonstrated remarkable generalization and transfer capabilities in material discovery and property prediction. These models can accelerate molecular dynamics (MD) simulation by several orders of magnitude while maintaining \textit{ab initio} accuracy, making them a promising new paradigm in material simulations. One notable example is Crystal Hamiltonian Graph Neural Network (CHGNet), pretrained on the energies, forces, stresses, and magnetic moments from the MPtrj dataset, representing a state-of-the-art GNN-UIP model for charge-informed MD simulations. However, training the CHGNet model is time-consuming(8.3 days on one A100 GPU) for three reasons: (i) requiring multi-layer propagation to reach more distant atom information, (ii) requiring second-order derivatives calculation to finish weights updating and (iii) the implementation of reference CHGNet does not fully leverage the computational capabilities. This paper introduces FastCHGNet, an optimized CHGNet, with three contributions: Firstly, we design innovative Force/Stress Readout modules to decompose Force/Stress prediction. Secondly, we adopt massive optimizations such as kernel fusion, redundancy bypass, etc, to exploit GPU computation power sufficiently. Finally, we extend CHGNet to support multiple GPUs and propose a load-balancing technique to enhance GPU utilization. Numerical results show that FastCHGNet reduces memory footprint by a factor of 3.59. The final training time of FastCHGNet can be decreased to \textbf{1.53 hours} on 32 GPUs without sacrificing model accuracy.
SchNet: A continuous-filter convolutional neural network for modeling quantum interactions
Kristof Schütt, Pieter-Jan Kindermans, Huziel Enoc Sauceda Felix, Stefan Chmiela, Alexandre Tkatchenko, Klaus-Robert Müller
Deep learning has the potential to revolutionize quantum chemistry as it is ideally suited to learn representations for structured data and speed up the exploration of chemical space. While convolutional neural networks have proven to be the first choice for images, audio and video data, the atoms in molecules are not restricted to a grid. Instead, their precise locations contain essential physical information, that would get lost if discretized. Thus, we propose to use continuousfilter convolutional layers to be able to model local correlations without requiring the data to lie on a grid. We apply those layers in SchNet: a novel deep learning architecture modeling quantum interactions in molecules. We obtain a joint model for the total energy and interatomic forces that follows fundamental quantumchemical principles. Our architecture achieves state-of-the-art performance for benchmarks of equilibrium molecules and molecular dynamics trajectories. Finally, we introduce a more challenging benchmark with chemical and structural variations that suggests the path for further work.
HSVLT: Hierarchical Scale-Aware Vision-Language Transformer for Multi-Label Image Classification
Ouyang, Shuyi, Wang, Hongyi, Niu, Ziwei, Bai, Zhenjia, Xie, Shiao, Xu, Yingying, Tong, Ruofeng, Chen, Yen-Wei, Lin, Lanfen
The task of multi-label image classification involves recognizing multiple objects within a single image. Considering both valuable semantic information contained in the labels and essential visual features presented in the image, tight visual-linguistic interactions play a vital role in improving classification performance. Moreover, given the potential variance in object size and appearance within a single image, attention to features of different scales can help to discover possible objects in the image. Recently, Transformer-based methods have achieved great success in multi-label image classification by leveraging the advantage of modeling long-range dependencies, but they have several limitations. Firstly, existing methods treat visual feature extraction and cross-modal fusion as separate steps, resulting in insufficient visual-linguistic alignment in the joint semantic space. Additionally, they only extract visual features and perform cross-modal fusion at a single scale, neglecting objects with different characteristics. To address these issues, we propose a Hierarchical Scale-Aware Vision-Language Transformer (HSVLT) with two appealing designs: (1)~A hierarchical multi-scale architecture that involves a Cross-Scale Aggregation module, which leverages joint multi-modal features extracted from multiple scales to recognize objects of varying sizes and appearances in images. (2)~Interactive Visual-Linguistic Attention, a novel attention mechanism module that tightly integrates cross-modal interaction, enabling the joint updating of visual, linguistic and multi-modal features. We have evaluated our method on three benchmark datasets. The experimental results demonstrate that HSVLT surpasses state-of-the-art methods with lower computational cost.
Improving Molecular Modeling with Geometric GNNs: an Empirical Study
Ramlaoui, Ali, Saulus, Théo, Terver, Basile, Schmidt, Victor, Rolnick, David, Malliaros, Fragkiskos D., Duval, Alexandre
Rapid advancements in machine learning (ML) are transforming materials science by significantly speeding up material property calculations. However, the proliferation of ML approaches has made it challenging for scientists to keep up with the most promising techniques. This paper presents an empirical study on Geometric Graph Neural Networks for 3D atomic systems, focusing on the impact of different (1) canonicalization methods, (2) graph creation strategies, and (3) auxiliary tasks, on performance, scalability and symmetry enforcement. Our findings and insights aim to guide researchers in selecting optimal modeling components for molecular modeling tasks.
Generative Model for Constructing Reaction Path from Initial to Final States
Hayashi, Akihide, Takamoto, So, Li, Ju, Okanohara, Daisuke
Mapping out reaction pathways and their corresponding activation barriers is a significant aspect of molecular simulation. Given their inherent complexity and nonlinearity, even generating a initial guess of these paths remains a challenging problem. Presented in this paper is an innovative approach that utilizes neural networks to generate initial guess for these reaction pathways. The proposed method is initiated by inputting the coordinates of the initial state, followed by progressive alterations to its structure. This iterative process culminates in the generation of the approximate representation of the reaction path and the coordinates of the final state. The application of this method extends to complex reaction pathways illustrated by organic reactions. Training was executed on the Transition1x dataset, an organic reaction pathway dataset. The results revealed generation of reactions that bore substantial similarities with the corresponding test data. The method's flexibility allows for reactions to be generated either to conform to predetermined conditions or in a randomized manner.